Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes

Authors

  • Fereydoun Hormozdiari
  • Can Alkan
  • Evan E. Eichler
  • Süleyman Cenk Sahinalp
Abstract

Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The realization of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals have not been designed to handle the short read lengths and the errors implied by the "next-gen" sequencing (NGS) technologies. In this paper, we give combinatorial formulations for the SV detection between a reference genome sequence and a next-gen-based, paired-end, whole genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.

Similar resources

Simultaneous structural variation discovery among multiple paired-end sequenced genomes.

With the increasing popularity of whole-genome shotgun sequencing (WGSS) via high-throughput sequencing technologies, it is becoming highly desirable to perform comparative studies involving multiple individuals (from a specific population, race, or a group sharing a particular phenotype). The conventional approach for a comparative genome variation study involves two key steps: (1) each paired...

Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery

Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational metho...

Combinatorial Algorithms for Transposon Insertion Discovery

Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have bee...

Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

As whole genome shotgun sequencing (WGSS) becomes more accessible using high-throughput sequencing technologies, undertaking comparative studies among different individuals (based on population, race, or genetic disease) is the next logical step. In this paper, we propose a paradigm shift in variation comparative studies (specifically structural variation) away from the conventional two step ap...

Genome variation discovery with high-throughput sequencing data

The advent of high-throughput sequencing (HTS) technologies is enabling sequencing of human genomes at a significantly lower cost. The availability of these genomes is hoped to enable novel medical diagnostics and treatment, specific to the individual, thus launching the era of personalized medicine. The data currently generated by HTS machines require extensive computational analysis in order ...

Journal:
  • Genome Research

Volume 19, Issue 7

Publication date: 2009